The Holy Grail of Sense Definition: Creating a Sense-Disambiguated Corpus from Scratch

Authors

  • Anna Rumshisky
  • Marc Verhagen
  • Jessica L. Moszkowicz

Abstract

This paper presents a methodology for creating a gold standard for sense definition using Amazon’s Mechanical Turk service. We demonstrate how this method can be used to create in a single step, quickly and cheaply, a lexicon of sense inventories and the corresponding sense-annotated lexical sample. We show the results obtained by this method for a sample verb and discuss how it can be improved to produce an exhaustive lexical resource. We then describe how such a resource can be used to further other semantic annotation efforts, using as an example the Generative Lexicon Mark-up Language (GLML) effort.

Similar resources

An Iterative Approach to Word Sense Disambiguation

In this paper, we present an iterative algorithm for Word Sense Disambiguation. It combines two sources of information: WordNet and a semantic tagged corpus, for the purpose of identifying the correct sense of the words in a given text. It differs from other standard approaches in that the disambiguation process is performed in an iterative manner: starting from free text, a set of disambiguat...

Case study of BushBank concept

In this paper, we present a new type of annotated corpus, called BushBank, which improves handling of ambiguity in natural language. Unlike in traditional approaches where data are directly disambiguated, in a BushBank, disambiguation is done later, based on application needs. This has major impact on the structures used in the corpus, since ordinary syntactic trees disallow ambiguity. Our appr...

A Korean Homonym Disambiguation System Based on Statistical Model Using Weights

A homonym can be disambiguated by other words in its context, such as the nouns and predicates used with it. This paper uses semantic information (co-occurrence data) obtained from definitions in the part-of-speech (POS) tagged UMRD-S. In this research, we have analyzed the result of an experiment on a homonym disambiguation system based on statistical model, to which Bayes' theorem is applied...

Analyzing the concept of sense of place and its effect on the identity of place in new cities

The purpose of the current article is to study the effect of sense of place on creating urban identity in new cities. To that end, the key concepts of the research and the theories related to the issue were studied. Accordingly, the data were collected using the library method and note taking. The studies indicated the presence of human in the environmen...

Wikicorpus: A Word-Sense Disambiguated Multilingual Wikipedia Corpus

This article presents a new freely available trilingual corpus (Catalan, Spanish, English) that contains large portions of the Wikipedia and has been automatically enriched with linguistic information. To our knowledge, this is the largest such corpus that is freely available to the community: In its present version, it contains over 750 million words. The corpora have been annotated with lemma...

Publication date: 2009